Abstract

A version of the definition of intelligent behaviour will be supplied in the context
of real and artificial systems. A short presentation of the principles of learning will
be given, starting with Pavlov’s classical conditioning, continuing through the
reinforced response and operant conditioning of Thorndike and Skinner, and finishing
with the cognitive learning of Tolman and Bandura. The most important figures within
behaviourism, especially those who contributed to AI, will be described. Some
tools of artificial intelligence that act according to those principles will be
presented. An attempt will be made to show when some simple rules for behaviour
modification can lead to complex intelligent behaviour.


1. Intelligence: Description 

It can be stated without any doubt that behaviourists have made a great contribution to the 
development of artificial intelligence. The evidence from the animal learning theory, especially 
the laws of learning discovered by behaviourists, has attracted researchers within artificial 
intelligence for many years and many models have been based on them. 

Intelligence is a complex and controversial concept, therefore it is very difficult to capture it by a 
simple definition. According to Jordan and Jordan [1] it is appropriate to regard intelligence as a 
concept we employ to describe actions of a certain quality. Two criteria should be used in this 
regard, namely, speed (i.e. how quickly an agent performs a particular task requiring mental 
ability) and power (i.e. the degree of difficulty of the tasks an agent can perform). On the other
hand, intelligence can also be defined in terms of the ability to perform cognitive processes, of
which three are fundamental: 1) abstraction, 2) learning, and 3) dealing with novelty.

Intelligence has been given many definitions by prominent researchers in the field. For example,
it has been defined as:

• A general ability which involves mainly the eduction of relations and correlates.
(Spearman, 1904) [2]

• The ability to judge well, to understand well, to reason well. (Binet and Simon, 1905) [3]

• The capacity to form concepts and to grasp their significance. (Terman, 1916) [4]

• The ability of the individual to adapt adequately to relatively new situations in life.
(Pintner, 1921) [5]

• The power of good responses from the point of view of truth or fact. (Thurstone, 1921) [6]

• The mental capacity to automatize information processing and to emit contextually
appropriate behaviour in response to novelty; intelligence also includes
metacomponents, performance components, and knowledge-acquisition components.
(Sternberg, 1986) [7]

There are two main approaches to describing intelligence: the psychometric approach and the
information-processing approach. The psychometric approach focuses on measuring or quantifying
cognitive factors or abilities that make up an intellectual performance. Those cognitive factors 
might include: verbal comprehension, memory ability, perceptual speed, and reasoning. The 
scholars who follow this approach either lump these factors together (lumpers) or split them 
apart (splitters). 

According to lumpers, intelligence involves a general unified capacity for reasoning, acquiring
knowledge, and solving problems. The best-known theory of this kind is Spearman’s two-factor
theory [2]. Spearman proposed that intelligence consists of two kinds of factors: a single general
factor (g) and numerous specific factors (s). Performance in any test or task is a function of both
g and s. The idea of a general intelligence factor lies behind the use of a single measure of
intelligence, such as an IQ (intelligence quotient) score.

In contrast to lumpers, splitters define intelligence as composed of many separate mental abilities
that function more or less independently. According to Gardner’s well-known multiple-factor
theory [8], there are at least seven independent aspects of intelligence: verbal skills, math skills,
spatial skills, movement skills, musical skills, insight about oneself, and insight about others.
Gardner stated that understanding these aspects comes from studying a person in his or her
environment and not from the results of IQ tests.

The competing approach to intelligence, the information-processing approach, defines
intelligence by analyzing the components of the cognitive processes that people use to solve
problems. A well-known example of this approach is Sternberg’s triarchic theory [7].

Sternberg proposes that intelligence can be divided into three ways of gathering and processing
information. The first uses analytical or logical thinking skills that are measured by traditional
intelligence tests. The second uses problem-solving skills that require creative thinking, the
ability to deal with novel situations, and the ability to learn from experience. The third uses
practical thinking skills that help a person to adjust to and to cope with his or her sociocultural
environment.

Although experts provide us with many definitions of intelligence, they tend to agree that
intelligence is: (1) the capacity to learn from experience and (2) the capacity to adapt to one’s
environment.

Not only the definition of intelligence but also its measurement is a very controversial topic. The
first systematic attempt to measure intelligence was made at the beginning of the twentieth
century by Alfred Binet. The Binet-Simon Intelligence Scale [3] contained questions that measured
vocabulary, memory, common knowledge, and other cognitive abilities. Binet also introduced the
concept of mental age, which became the basis for computing the intelligence quotient (IQ).
According to him, mental age is a method of estimating a child’s intellectual progress by comparing
the child’s score on an intelligence test to the scores of average children of the same age. Several
years later Terman [4] proposed a formula to calculate an IQ score: the intelligence quotient is
computed by dividing a child’s mental age (MA), as measured by an intelligence test, by the child’s
chronological age (CA) and multiplying the result by 100. Nowadays, the most widely used IQ
tests for adults are the Stanford-Binet tests and the Wechsler Adult Intelligence Scale-Revised
(WAIS-R) [9].
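
As a worked illustration of the ratio formula described above, the short Python sketch below computes IQ = MA / CA × 100. The function name `ratio_iq` is an assumption introduced here for illustration only; it does not come from Terman [4].

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100.0


# A child of 8 who performs like an average 10-year-old:
print(ratio_iq(10, 8))  # 125.0
# A child whose mental age equals the chronological age scores 100:
print(ratio_iq(8, 8))   # 100.0
```

A score above 100 thus means that the mental age runs ahead of the chronological age, and a score below 100 that it lags behind.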


2. Artificial Intelligence 

Traditional AI and cognitive science proceed by developing computer models of mental, human-like
functions. As a consequence, intelligence in these disciplines is closely tied to computers: it
is understood in terms of computer programs. Input is provided, the input is processed,
and finally output is generated. By analogy, the human brain is viewed in some sense as a
very powerful computer, as the seat of intelligence (Pfeifer & Scheier, 1999) [10]. However, when
researchers in AI started applying these ideas to build robots that interact with the real world,
they found it rather difficult to make robots perform well under this view of intelligence.

There are several frequent criticisms of classical AI: 

• Classical AI systems lack generalization capabilities: complete systems cannot be
made from studies of isolated modules.

• Classical AI systems lack robustness, cannot perform in real time, and run on
sequential machines.

• Classical AI systems are goal based and organized hierarchically; their processing is
done centrally.

• The real world differs from virtual ones: it has its own dynamics. The virtual worlds
used in AI systems are static, and their states carry complete information.

• The frame problem arises, i.e. how can models of parts of the real world be kept in
tune with the real world as it changes, and how can systems determine which
changes in the world are relevant to a given situation without having to test all possible
changes.

Many started looking for alternatives. It was R. Brooks from MIT [11] who maintained that
all of AI’s ideas concerning thinking, logic, and problem solving were based on assumptions that
come from our own introspection, from how we see ourselves. Instead, one has to focus on the
interaction with the real world: intelligent behaviour can be achieved using a large number of
loosely coupled processes that function predominantly in an asynchronous, parallel way.
This was the origin of his subsumption architecture. He called his new paradigm in the study of
intelligence “behaviour-based robotics” (cf. Arkin, 1998) [12]. Now one often refers to the field
as embodied cognitive science. Subsumption is a method of decomposing a robot’s control
architecture into a set of task-achieving behaviours or competences. One should add at this point
that the term behaviour is used here in two ways. In the first, more informal sense, behaviour is
the result of a system-environment interaction, while in the second, more technical sense,
behaviour refers to internal structures, i.e. the particular layers of modules designed to generate
particular behaviours (in the first sense).

In the classical AI approach, the control architecture for mobile robots is based on functional
decomposition. First, information from different sensor systems is received and integrated into a
central representation. Then internal processing takes place, in which an environment model (world
model) is built or updated and the next actions are planned. (Here decisions concerning
further actions are made.) The final stage is the execution of some actions. Altogether such an
approach leads to the sense-think-act cycle, in which the thinking stage is split into a modelling
and a planning activity.

In behaviour-based robotics the main role is played by a method of decomposing the control
system of a robot into a set of task-achieving behaviours (or competences). R. Brooks called this
the subsumption architecture, in which the control architecture is built by incrementally adding
competences on top of each other. This method contrasts with the classical AI’s functional
decomposition. Implementations of such task-achieving behaviours are called layers: higher-level
layers build and rely on lower-level ones. Instead of a single information flow from
perception to world modelling and action, there are multiple paths, the layers, that are active in
parallel. The small subtasks of the robot’s overall task are not controlled in a traditional,
hierarchical way, since each layer can function relatively independently; the subsumption
approach realizes a direct coupling between sensors and actuators, with only limited internal
processing. Here the direct influence of the behaviourists’ approach is manifest.
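
The layered decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Brooks’s actual wiring: the layer names (`cruise`, `avoid`), the sensor key `proximity`, the command strings, and the rule that the highest layer producing a command suppresses those below it are all hypothetical simplifications introduced here.

```python
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, float]
Layer = Callable[[Sensors], Optional[str]]


def cruise(sensors: Sensors) -> Optional[str]:
    # Lowest competence: always proposes driving forward.
    return "forward"


def avoid(sensors: Sensors) -> Optional[str]:
    # Added on top of cruise: speaks up only when an obstacle is near.
    if sensors.get("proximity", 1.0) < 0.2:
        return "turn_left"
    return None


def arbitrate(layers: List[Layer], sensors: Sensors) -> Optional[str]:
    # Layers are ordered lowest first. Every layer reads the sensors
    # directly (no shared world model); the highest layer that produces
    # a command suppresses the commands of the layers below it.
    command = None
    for layer in layers:
        proposal = layer(sensors)
        if proposal is not None:
            command = proposal
    return command


layers = [cruise, avoid]
print(arbitrate(layers, {"proximity": 0.9}))  # forward
print(arbitrate(layers, {"proximity": 0.1}))  # turn_left
```

Note that removing `avoid` from the list leaves a working (if clumsy) robot, which is the sense in which competences are added incrementally on top of each other.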

In a modern encyclopaedia one can read that AI is an interdisciplinary field that combines research
and theory from cognitive psychology and computer science, and that focuses on the
development of artificial systems that display human-like thinking or “intelligence.” In other
references AI is understood as any synthetic intelligence, i.e. the goal of the field of study in the
above-described interdisciplinary domain. We will rather use the term embodied AI
to underline this new point of view.


3. Behaviourism 

Behaviourism is considered one of the major schools of thought in the history of psychology.
This approach emphasizes the objective, scientific analysis of observable behaviours to the 
exclusion of consideration of unobservable mental processes. It studies how organisms learn new 
behaviours and change or modify existing behaviours in response to influence from their 
environments. The basic assumption of behaviourism is that learning is the most important factor 
in the development of human behaviour and the formation of personality. According to 
behaviourists learning is based on association between stimulus (S) and response (R) to it. 

John Watson (1878-1958) is usually regarded as the father of behaviourism. According to him,
psychology is a purely objective experimental branch of natural science. Its theoretical goal is
the prediction and control of behaviour. His ideas, published in 1913 in a paper titled “Psychology
as the Behaviorist Views It” [13], marked the beginning of the behavioural approach in
psychology.

Although the founding of behaviourism is usually linked with the name of John Watson, many of
the basic principles had already been published before Watson’s time by a group of Russian
researchers, in particular Ivan Petrovich Pavlov (1849-1936). In 1904 he won a Nobel Prize for
his studies on the reflexes involved in digestion. But it was through his discovery of conditioning
that he made a considerable impact on the development not only of psychology, but also of AI [14].

The next person who made a great contribution to the development of psychology and AI was
Burrhus Frederic Skinner (1904-1990). He constructed a radical behaviourist theory in which
behaviour is explained as the lawful result of environmental factors. Skinner is especially famous
for the study of a form of learning known as operant conditioning [15].

Three further scientists who had a great impact on the development of behaviourism were
Edward Lee Thorndike (1874-1949), Clark L. Hull (1884-1952) and Edward Chace Tolman
(1886-1959). Their theories can be described as ‘subjective’ behaviourism, because they
moved away from Skinner’s radical behaviourism and, in their explanations, refer to
certain processes which take place within the organism.

Thorndike was particularly known for his extensive research into learning in animals and his
attempt to develop a theoretical explanation for learning phenomena [16]. He initially described
a form of learning known as trial-and-error learning or instrumental conditioning (Skinner used
basically the same form of conditioning, but called it operant conditioning).

Hull is credited with developing the first systematic theory of learning, known as the drive
reduction theory [17]. According to this theory, it is drive and need that motivate an organism to
behave in a particular way.

According to Tolman [18], behaviour is largely regulated by cognitive factors such as the 
perception of signs and patterns in the environment, and the expectation of reward. Tolman can 
be regarded as a precursor of the social cognitive learning theory [19]. 

Social cognitive learning theory [20] agrees with other behaviouristically oriented theories in 
regarding behaviour as primarily learned and in focusing on the study of observable behaviour. 
However, there is a major difference because the social cognitive theory uses unobservable 
matters such as thoughts, expectations, and motivation in its explanation of behaviour. 

According to this school, observational learning is the most important method of learning.
Three psychologists, namely Julian Rotter, Albert Bandura and Walter Mischel are widely 
regarded as the most important figures in the development of social cognitive learning theory. 


4. Learning 

Learning can be defined as a relatively permanent change in behaviour (both mental events and
overt behaviours) that results from experience. Learning has taken place when a person or an
animal has acquired knowledge of something that was previously unknown to him, or when he
can do something he previously could not do.

The main types of learning that have been identified and described so far can be classified on
the basis of two criteria:

• The degree of understanding a learner must have of what is being learnt; 

• The level of awareness on which learning takes place. 

Two approaches to the study of learning have been used: association learning and cognitive learning.

In the association approach to learning, stimuli and responses are units on which the analysis of 
behavioural changes is based. The aim is to establish what the relationship is between a stimulus 
(S) and the human or animal organism’s response (R) to it. There are two main types of 
association learning: 

• Classical conditioning - I. P. Pavlov (1927);

• Operant conditioning - E.L. Thorndike (1913) [21] and B.F. Skinner (1969). 

4.1 Classical Conditioning 

Classical conditioning is a kind of learning in which a neutral stimulus acquires the ability to
produce a response that was originally produced by a different stimulus.

The Russian physiologist Ivan Pavlov (1849-1936) is the father of classical conditioning. Pavlov
first discovered that the reflex of salivation and the secretion of gastric juices in a dog occur not
only when food is placed in the dog’s mouth, but also when the dog sees the food. He became
interested in this phenomenon (Figure 1). In an experimental situation food was placed in a dog’s
mouth. Salivation occurred: salivation is a natural, reflexive and thus non-learned response to
the stimulus (food), which is accordingly called the unconditioned stimulus (US). Salivation
occurred every time food was given to the hungry dog. This response was named the
unconditioned response (UR). Pavlov then rang a bell close to the dog but, as expected, no
salivation occurred. The sound of the bell is a neutral stimulus (NS). Later, Pavlov
rang the bell before putting food in the dog’s mouth. Salivation occurred. After a number of
instances of hearing the bell paired with food, Pavlov again rang the bell, but he did not give food
to the dog. Salivation occurred. In this situation salivation was elicited by the sound stimulus,
which had become a conditioned stimulus (CS). Pavlov called this response a conditioned
response (CR). The new S-R relationship (the relationship between the sound of the bell and
salivation) is a consequence of the learned association between two stimuli (the bell and the food).

During classical conditioning, a dog not only learns to salivate to a tone but also simultaneously
learns a number of other things that Pavlov identified as being a part of the classical conditioning
procedure. The most important of them are: generalization, discrimination, extinction, and
spontaneous recovery.

[Figure 1 diagram. Before conditioning: NS (bell); UR (salivation). During conditioning: US (food) + CS (bell), UR (salivation). After conditioning: CS (bell), CR (salivation).]

Figure 1. The process of Pavlovian classical conditioning. After [1].

Nowadays there are two explanations of classical conditioning: the traditional one and the modern one.

Pavlov’s traditional explanation is known as stimulus substitution. According to stimulus
substitution, a bond or association forms between the conditioned stimulus and the unconditioned
stimulus, so that the conditioned stimulus eventually substitutes for the unconditioned stimulus.
Rescorla’s modern explanation [23] is called stimulus information. According to the information
theory of classical conditioning, an organism learns a relationship between two stimuli such that
the occurrence of one stimulus predicts the occurrence of the other.
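
One common way to make the idea of one stimulus predicting another concrete is an error-correcting update of the Rescorla-Wagner type. The text above does not name this model, so the sketch below is an assumption brought in purely for illustration: `v` stands for the strength with which the CS predicts the US, and it changes in proportion to the prediction error. The function name, the learning rate and the trial counts are hypothetical.

```python
def rescorla_wagner(trials, rate=0.3, v0=0.0):
    """Track how strongly the CS predicts the US over a series of trials.

    trials: iterable of bool, True when the US (food) follows the CS (bell).
    Returns the list of associative strengths after each trial.
    """
    v = v0
    history = []
    for us_present in trials:
        target = 1.0 if us_present else 0.0
        v += rate * (target - v)  # learn from the prediction error
        history.append(v)
    return history


# Acquisition: the bell is always followed by food.
acquisition = rescorla_wagner([True] * 10)
# Extinction: the bell is presented alone, starting from the learned strength.
extinction = rescorla_wagner([False] * 10, v0=acquisition[-1])

print(round(acquisition[-1], 3), round(extinction[-1], 3))
```

During acquisition the strength rises towards 1, and once the bell is no longer followed by food it decays towards 0, which corresponds to the extinction phenomenon listed earlier.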

4.2 Operant Conditioning (OC) 

Operant conditioning [9] ‘is a kind of learning in which the consequences that follow some
behaviour increase or decrease the likelihood of that behaviour occurring in the future. In OC an
organism (agent) acts or “operates” on the environment in order to change the likelihood of the
response occurring again’ (p. 214).

The first steps in the development of operant conditioning are found in the work of Thorndike
[21]. He formulated the law of effect, which states that (goal-directed) behaviours followed by
positive consequences are strengthened, while behaviours followed by negative consequences
are weakened. Thorndike’s ideas were further developed and expanded by Skinner. In a typical
Skinner experiment a hungry pigeon is placed in a Skinner box. The pigeon walks around in the
box, pecking here and there. Eventually the pigeon pecks against the lighted window and food
falls into the bowl. By pecking against the lighted window the pigeon “operates” on its
environment. Therefore this response is called an operant response. The food is the reward or
reinforcer, which reinforces the appropriate response and increases the likelihood that the pigeon
will perform that behaviour in the future. After the reinforcer has been presented a number of times
immediately upon the appropriate pecking response, the probability of pecking is greater than
that of any other response. The procedure of behavioural shaping can be used in conditioning the
pigeon to peck against the window. During shaping, the experimenter reinforces behaviours that
lead up to or approximate the desired behaviour. The progress of operant conditioning can be
divided into two phases (Figures 2 and 3).

[Figure 3 diagram; surviving labels: conditioned stimulus, conditioned operant response.]

Figure 3. The second phase of operant conditioning. After [1].


The first phase corresponds to trial-and-error learning, in the sense that the pigeon produces the
correct response by accident. The second phase relates to the maintenance of the accidentally
discovered correct response in accordance with the principle of reinforcement.

Operant conditioning focuses very sharply on the manipulability of behaviour. By manipulating
the environmental conditions under which learning takes place, control can be exercised over the
type and strength of behaviour that is learned. The essentials of operant conditioning can be
summarized in five words: consequences are contingent on behaviour. There are two kinds of
consequences: reinforcement and punishment. Reinforcement is a consequence that increases
the likelihood of a behaviour occurring again, and punishment is a consequence that decreases the
likelihood of a behaviour occurring again. In addition, reinforcement can be either positive or
negative. Positive reinforcement is the presentation of a pleasant stimulus, which increases the
likelihood of a response occurring again. Negative reinforcement is the removal of an unpleasant
stimulus, thereby increasing the likelihood of a response occurring again.
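
The contingency between behaviour and consequence can be illustrated with a small simulation of the pigeon in the Skinner box. This is a sketch under stated assumptions, not a model taken from the cited literature: the response names, the numerical response strengths, the step size and the proportional choice rule are all hypothetical choices made here for illustration.

```python
import random


def condition(responses, consequence, trials=200, step=0.5, seed=0):
    """Strengthen or weaken responses according to their consequences.

    consequence(response) returns +1 for a reinforcer, -1 for a punisher
    and 0 when nothing follows. Returns the final response probabilities.
    """
    rng = random.Random(seed)
    strength = {r: 1.0 for r in responses}
    for _ in range(trials):
        # The agent emits a response with probability proportional to its strength.
        weights = [strength[r] for r in responses]
        emitted = rng.choices(responses, weights=weights)[0]
        # The consequence is contingent on the emitted behaviour.
        change = step * consequence(emitted)
        strength[emitted] = max(0.05, strength[emitted] + change)
    total = sum(strength.values())
    return {r: s / total for r, s in strength.items()}


def skinner_box(response):
    # Food is delivered only after a peck against the lighted window.
    return 1 if response == "peck_window" else 0


probs = condition(["peck_window", "peck_floor", "turn"], skinner_box)
print(max(probs, key=probs.get))
```

The first reinforced peck happens by accident (the trial-and-error phase); afterwards each reinforcement makes the same response more likely, which is the maintenance phase described above.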

4.3 Cognitive Learning 

Cognitive learning cannot be explained on the basis of reinforcing conditions. It is a kind of
learning that involves mental processes, such as attention and memory, and may not involve any
external rewards or require the person to perform any observable behaviours. Learning through
thinking does not exclude the principles of association; it is, however, regarded as a conscious act
of thinking. There are two main kinds of cognitive learning:

• Sign/latent learning - E.C. Tolman (1932) [18],

• Observational learning - A. Bandura (1986) [23].

4.3.1 Sign/Latent Learning 

According to Tolman, learning is attributed to the discovery of which response leads to what
effect, and to a learned expectation that a certain stimulus will be followed by another stimulus.
The stimuli are processed within the organism into an organized cognitive structure (cognitive
map). The cognitive map consists of an organism’s perceptual impressions of a learning situation.
The performance of a correct response is a product of cognitive processes. Tolman also showed that
organisms can learn in the absence of reinforcement (incidental learning). Bandura has developed
Tolman’s ideas further.

4.3.2 Observational Learning 

Bandura is the father of observational learning. It is a form of learning that develops through
watching and does not require the observer to perform any observable behaviour or receive a
reinforcer. There are four components of observational learning: acquisition, retention,
performance and reinforcement.

The major differences between associative and cognitive learning can now be summarized.
Associative learning (behavioural approach) provides a means of describing how a person or
animal learns a series of correct or desired responses; it is a kind of learning which demands
little more than parrot-like repetitions under reinforcing conditions.

Cognitive learning explains learning with understanding and insight. The learning situation and
material are perceptually organized by the learner, who then formulates concepts and rules
and reorganizes the information so obtained into new and significant patterns of information.





As a summary of learning, following Balkenius [24], we could present the main explanations of 
biological (animal) learning. According to those explanations the animal learns: 

• Stimulus-response associations 

• Stimulus-approach associations 

• Place-approach associations 

• Response chains

• Stimulus-approach chains

• Place-approach chains

• S-R-S’ associations 

• S-S’ associations 

However, we are not going to develop these aspects here. 


5. Models of Behaviourists in AI

Robot learning is in most cases a kind of associative learning, but it varies widely, ranging
from models of classical conditioning to reinforcement learning. Classical conditioning in
robot learning is cast in the paradigm of unsupervised learning, for which the learning rules are
similar to the Hebb rule or the Kohonen rule. The Hebb rule [25] states that when a node (neuron) i
repeatedly and persistently takes part in activating another node (neuron) j, then the i-th neuron’s
efficiency in activating the j-th neuron is increased. For example, if $a_i$ and $a_j$ are the
activations of neurons i and j, respectively, $\eta$ is the learning rate, and $w_{ij}$ is the
connection weight (efficiency weight) between neurons i and j, then the weight change
$\Delta w_{ij}$ is

$$\Delta w_{ij} = \eta \, a_i a_j \qquad (1)$$
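
The Hebb rule, in which the weight change is the learning rate times the product of the two activations, can be written out as a few lines of Python. The function name `hebb_update`, the list-of-lists weight layout and the rate of 0.1 are assumptions made for this sketch.

```python
def hebb_update(weights, pre, post, rate=0.1):
    """Apply one Hebbian step to a weight matrix.

    weights[i][j] connects presynaptic node i to postsynaptic node j;
    each weight grows by rate * pre[i] * post[j].
    """
    return [
        [w + rate * a_i * a_j for w, a_j in zip(row, post)]
        for row, a_i in zip(weights, pre)
    ]


w = [[0.0, 0.0], [0.0, 0.0]]
w = hebb_update(w, pre=[1.0, 0.0], post=[1.0, 1.0])
# Only the connections leaving the active node i = 0 have grown.
print(w)  # [[0.1, 0.1], [0.0, 0.0]]
```

Each weight is updated from the activations of the two nodes it connects and nothing else, which is why the rule needs no central control.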

Hebbian learning has the advantage of being simple and based only on local communication 
between neurons: no central control is required. If a mobile robot, an agent, has been equipped 
with proximity and collision sensors as well as with a motor and wheels, the following so-called 
distributed adaptive control architecture can be interpreted in terms of classical conditioning 
(Figure 4).

[Figure 4 diagram; surviving label: motor output.]

Figure 4. Distributed adaptive control architecture interpreted in terms of classical
conditioning, after [10].


6. Distributed Adaptive Control 

If a robot hits an obstacle as illustrated in Figure 4, activating a collision sensor, it backs up a
little, turns and moves forward. Each sensor is connected to a node in the neural network: the
collision sensors to nodes in the collision layer, the proximity sensors to nodes in the proximity
layer. Moreover, the proximity layer is fully connected to the collision layer in one direction,
while the collision layer is additionally connected to a motor output layer. The latter connections
implement the basic reflexes, i.e. the motor responses called (in the Pavlovian approach)
unconditioned responses (UCR). The activations of the proximity sensors are the conditioned
stimuli (CS), and the activations of the collision sensors model the unconditioned stimuli (UCS);
the latter cause the robot to turn away from the obstacle. We see that the UCSs are connected to
the UCRs.

(In Figure 4 the collision sensors shown on both sides correspond to the two sides of the robot:
the left side and the right one. The UCS-UCR connections differ between the sides, since in one
case a left turn and in the other a right turn is executed to avoid the obstacle.) Note that before
conditioning, activation of the proximity sensors (NS) did not cause any response (no action of the
motor devices). The conditioned response (CR) should, according to the Pavlovian approach, be very
much like the UCR; in the present (robot) situation it is not only similar to the UCR but in fact
identical. (Notice that in Pavlov’s experiments the neutral CS (bell) was paired with a UCS
(food) that reliably produced the UCR (salivation), and after some trials the bell was sufficient to
produce salivation (the CR).)

If at time t the robot hits an obstacle, the corresponding node (neuron) in the collision layer is
turned on, and simultaneously several proximity nodes are activated. Then, through Hebbian
learning, at the next time step t+1 (cf. Eq. (1)) the corresponding connections between the
proximity nodes (CS) and the active collision node (UCS) are strengthened
($\Delta w_{ij} = w_{ij}(t+1) - w_{ij}(t)$). (In Figure 4 the learning stage is schematically
presented by two curved arrows pointing to the bottom of the robot, the NS/CS node.) This means
that next time more activation from the CS nodes will be propagated to the UCS node. (Notice that
when a collision occurs the UCS nodes activate the UCR nodes in the motor output layer, and the
robot backs up and turns to avoid the obstacle.) If the nodes in the collision layer (UCS) are
binary threshold units, then after several hits, due to the Hebbian learning rule, the activation
originating from the CS layer becomes strong enough to raise the UCS node above threshold without
a collision, so that the robot’s motors are activated to avoid the obstacle. (In Figure 4 this is
schematically represented by two broken arrows pointing to the top of the robot, the UCR/CR node.)
When this happens the robot has learned to avoid obstacles through the principle of classical
conditioning [10].
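
The learning process described above can be reduced to a single proximity node (CS) connected to a single binary threshold collision node (UCS). The sketch below is a deliberately minimal reading of the architecture, not the implementation of [10]: the function name, the learning rate, the threshold, the constant proximity activation and the absence of any forgetting term are assumptions made here.

```python
def run_encounters(n, rate=0.2, threshold=1.0, proximity=0.8):
    """Simulate n approaches to an obstacle; return final weight and hit log."""
    w = 0.0            # weight from the proximity node (CS) to the collision node (UCS)
    collisions = []
    for _ in range(n):
        # Near the obstacle only the proximity sensor is active. If its
        # weighted signal alone reaches threshold, the reflex fires early.
        anticipates = w * proximity >= threshold
        collided = not anticipates
        collision_input = 1.0 if collided else 0.0
        # Binary threshold unit driven by the collision sensor and the CS input.
        ucs = 1.0 if collision_input + w * proximity >= threshold else 0.0
        # Hebbian step, Eq. (1): CS and UCS nodes are active together.
        w += rate * proximity * ucs
        collisions.append(collided)
    return w, collisions


w, collisions = run_encounters(12)
print(collisions)
```

With these particular numbers the robot hits the obstacle on the first eight encounters and avoids it from the ninth onward, because by then the proximity signal alone lifts the collision node above threshold.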

The unsupervised learning paradigm known as the Kohonen map (or network) can be used by a robot
for location recognition: for example, the robot measures the time between turn actions and
thereby learns to recognize a particular location by building simple internal representations of
its environment through a process of self-organization (without using an explicit world model). In
contrast to the distributed adaptive control described above, learning in the Kohonen algorithm is
not incremental.
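
A one-dimensional Kohonen map over a single measured quantity gives a feel for how such location recognition could work. The sketch is an assumption-laden toy, not the robot system referred to above: the input is taken to be the time between turn actions, the sample values are invented, and the number of nodes, the learning rate and the neighbourhood function are hypothetical choices.

```python
def train_som(samples, n_nodes=4, epochs=50, rate=0.3):
    """Train a 1-D Kohonen map on scalar samples; return node prototypes."""
    lo, hi = min(samples), max(samples)
    # Spread the initial prototypes evenly over the input range.
    protos = [lo + (hi - lo) * k / (n_nodes - 1) for k in range(n_nodes)]
    for _ in range(epochs):
        for x in samples:
            # Competition: the node with the closest prototype wins.
            win = min(range(n_nodes), key=lambda k: abs(protos[k] - x))
            # Cooperation: the winner and its map neighbours move towards x.
            for k in range(n_nodes):
                if abs(k - win) <= 1:
                    h = 1.0 if k == win else 0.5
                    protos[k] += rate * h * (x - protos[k])
    return protos


def recognise(protos, x):
    """Return the index of the node that responds to input x."""
    return min(range(len(protos)), key=lambda k: abs(protos[k] - x))


# Invented data: short intervals in one place, long intervals in another.
intervals = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
protos = train_som(intervals)
print(recognise(protos, 1.0), recognise(protos, 5.0))
```

After training, short and long intervals activate different nodes of the map, so the index of the winning node serves as a simple internal label for a location, without any explicit world model.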

Operant learning in robots, on the other hand, is cast in the paradigm of self-supervised
learning. Here learning is based on the reward (or punishment) resulting from behaviour. This is
somewhat similar to the operant conditioning of Thorndike and Skinner. Two methods are
distinguished: the first is known in psychology as reinforcement learning, the other as
value-based learning. Value-based learning is learning modulated by a value system, which
places values on various types of sensory-actuator (motor) coordination (i.e. value
systems are activated only after an agent has performed a behaviour). Both methods, however,
employ principles of self-organization. The value system, which provides a kind of basic
motivation for the agent, guides the process of self-organization.

The third approach to learning is teaching, which leads to supervised learning or
error-directed learning. For artificial autonomous agents, such as neural networks that
model human behaviour, the delta rule or, more generally, the error back-propagation rule are
examples of such learning rules. It is often said that supervised learning resembles the way a
mother teaches her child: the child can use the teaching signal from the mother to adjust his
(her) responses. This is done by means of a supervised learning scheme in which the feedback from
the mother has to be translated into error signals. However, this translation involves rather
complex perceptual problems.
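
The delta rule mentioned above can be sketched for a single linear unit: the difference between the teaching signal and the unit’s output is the error, and each weight is changed in proportion to that error and to its input. The function name, the learning rate, the number of epochs and the toy data set are assumptions made for this illustration.

```python
def train_delta(samples, rate=0.1, epochs=200):
    """Fit a linear unit y = w . x + b with the delta rule.

    samples: list of (inputs, target) pairs, where inputs is a tuple of floats.
    Returns the learned weights and bias.
    """
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = target - y  # teaching signal translated into an error signal
            w = [wi + rate * err * xi for wi, xi in zip(w, x)]
            b += rate * err
    return w, b


# Toy teacher: target = 2 * x1 - x2 + 0.5
data = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5), ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.5)]
w, b = train_delta(data)
print([round(v, 2) for v in w], round(b, 2))
```

Unlike the Hebbian and operant sketches, this scheme presupposes that a correct target is available for every input, which is exactly the translation problem pointed out above.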


7. Conclusions 

Following the point of view of the authors of [10], we can state that complex intelligent
behaviour can be performed by a complete system (agent). According to them, such a system
should satisfy the following conditions:

• Complete system (agent) must possess the architecture: 

- with direct coupling of perception to action 

- with dynamic interaction with the environment 

- with intrinsic mechanisms to cope with resource limitations and incomplete 
knowledge 

- with decentralized processing

• Complete system has to be the autonomous agent (self-sufficient agent, equipped with
the appropriate learning mechanism, with its own history, adaptive)

• Complete system has to be the situated agent (it acquires information about its 
environment only through its sensors and interacts with the world on its own) 

• Complete system has to be embodied (it must interact with its environment, is 
continuously subjected to physical forces, to energy dissipation, to damage, to any 
influence in the environment) 

• Complete system is behaviour based, not goal based 

• Complete system includes sensors and effectors 

• Sensory signals (stimuli) should be mapped (relatively) directly to effector motors
(responses)

• Complete system is equipped with a large number of parallel processes connected (only 
loosely) to one another 

This leads to embodied cognitive science and to embodied intelligence, introduced by Rodney
Brooks (1991), and to the subsumption architecture.

Since complete (i.e. intelligent) systems are behaviour based, the behaviourists’ contributions are
obvious.